Evaluating and Improving the Extraction of Mathematical Identifier Definitions

Authors

  • Moritz Schubotz
  • Leonard Krämer
  • Norman Meuschke
  • Felix Hamborg
  • Bela Gipp
Abstract

Mathematical formulae in academic texts significantly contribute to the overall semantic content of such texts, especially in the fields of Science, Technology, Engineering and Mathematics. Knowing the definitions of the identifiers in mathematical formulae is essential to understand the semantics of the formulae. Similar to the sense-making process of human readers, mathematical information retrieval systems can analyze the text that surrounds formulae to extract the definitions of identifiers occurring in the formulae. Several approaches for extracting the definitions of mathematical identifiers from documents have been proposed in recent years. So far, these approaches have been evaluated using different collections and gold standard datasets, which prevented comparative performance assessments. To facilitate future research on the task of identifier definition extraction, we make three contributions. First, we provide an automated evaluation framework, which uses the dataset and gold standard of the NTCIR-11 Math Retrieval Wikipedia task. Second, we compare existing identifier extraction approaches using the developed evaluation framework. Third, we present a new identifier extraction approach that uses machine learning to combine the well-performing features of previous approaches. The new approach increases the precision of extracting identifier definitions from 17.85% to 48.60%, and increases the recall from 22.58% to 28.06%. The evaluation framework, the dataset and our source code are openly available at: https://ident.formulasearchengine.com.
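
The precision and recall figures above compare extracted identifier definitions against a gold standard. As a minimal sketch of how such scores can be computed, the following assumes that each result is an (identifier, definition) pair and that a pair counts as correct only on an exact match with a gold pair. The function name `precision_recall` and the toy data are hypothetical and are not taken from the authors' framework, which may match definitions differently.

```python
from typing import Set, Tuple

# An extraction result: (identifier, definition), e.g. ("m", "mass").
Pair = Tuple[str, str]


def precision_recall(extracted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Score extracted pairs against gold pairs using exact matching (an assumption)."""
    true_positives = len(extracted & gold)
    # Precision: share of extracted pairs that appear in the gold standard.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold pairs that the extractor found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for illustration only.
gold = {("E", "energy"), ("m", "mass"), ("c", "speed of light"), ("v", "velocity")}
extracted = {("E", "energy"), ("m", "mass"), ("c", "constant")}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In this toy case two of the three extracted pairs are correct and two of the four gold pairs are found, so precision and recall differ, which is why the abstract reports both.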

Similar resources

Mapping CRC Card into Stochastic Petri Net for Analyzing and Evaluating Quality Parameter of Security (TECHNICAL NOTE)

CRC cards are an unconventional method for identifying and describing classes, their behavior, responsibilities, and collaborators. Representing the three categories of class, responsibilities, and collaborators can give a proper picture of a scenario. These cards are an effective method for analyzing scenarios. Despite the positive features of CRC cards, among their weaknesses is a failure to s...

Optimization of Colchicine Extraction from Colchicum Kurdicum (Bornm.) Stef. Corm and Evaluating Anti-Inflammatory and Anti-Oxidant Activities of the Plant Extract

Background and purpose: Colchicum kurdicum (Bornm.) Stef. is a monocotyledon plant which is endemic to Iran. The corm and seeds of this plant have some bioactive compounds, especially tropolone alkaloids that are used in treatment of inflammations, rheumatoid arthritis, gout, joint pains, and cancers. This study aimed at optimization of colchicine extraction from the corms of C. kurdicum and ev...

Extracting Definitions of Mathematical Expressions in Scientific Papers

Natural language definitions of mathematical expressions are essential for understanding the mathematical content of scientific papers. A textual description corresponding to a mathematical expression determines the type of symbol or function and the specific name for reference. Our objective is to create an automatic way of extracting definitions of mathematical expressions. We needed to creat...

Mass Transfer Mechanism and Mathematical Model for Extraction Process of L-Theanine across Bulk Liquid Membrane

This paper deals with the extraction of L-Theanine across a bulk liquid membrane containing Aliquat 336 as a carrier and cyclohexane as solvent. The optimum operating conditions are as follows: an extraction time of 150 min, an initial theanine concentration in the feed phase of 1.8 g/L, a carrier concentration of 0.5 M, the ion of the receiving phas...

A Mathematical Model for Evaluating the Efficiency of the University of Kashan’s Faculties

Efficiency evaluation of units has been of interest for many years in domains such as management, economy, business, banking, and many others. Data envelopment analysis is a popular operations research method for measuring the relative efficiency of units that use multiple inputs to produce multiple outputs. As we know, universities play a key role in many aspects of a c...

Publication date: 2017